Abstract

Physics analysis in astroparticle experiments requires the ability to recognize
new phenomena. To establish what is new, it is important to develop tools
for automatic classification that can compare the final result with data from
different detectors. A typical example is the detection and classification of
Gamma Ray Bursts and their possible association with known sources: for this
task, physicists will need in the coming years tools to associate data from
optical databases, from satellite experiments (EGRET, GLAST), and from
Cherenkov telescopes (MAGIC, HESS, CANGAROO, VERITAS).
1 Introduction



Clustering of features is an important problem in many physics experiments. 
Such an analysis task can be performed: 

• in a supervised way, when the analyst has some examples, for which the cor- 
rect classification is known. This can be done, for example, in most problems 
related to particle physics at accelerators, where there is a generally good 
knowledge of detectors and of the underlying physics, and good simulations 
are available. 

• in an unsupervised way, when the events are partitioned into classes of
similar elements without using additional information. This is the case
especially for fields operating in a discovery regime, such as astroparticle
physics.

The idea of automatic classification is not new in particle and astroparticle
physics. Cleaning up the signal and separating concurrent signals when
nonlinear effects and high-order correlations are important has been standard
in particle physics since the analysis of the branching fraction of the Z boson
into bb pairs by DELPHI [1].

A substantial literature exists on the use of automatic classifiers in
astroparticle physics (see for example [2] and references therein). Such
classification has mostly been done with Multilayer Perceptrons, while the bulk
of the work based on unsupervised classification uses Independent Component
Analysis (see for example [3] and references therein). Studies based on
Self-Organizing Maps and Growing Self-Organizing Networks [4,5,7,8,9] have
recently started [10], but a general framework for multiwavelength
classification is still missing.



2 A case study in astroparticle physics 



Gamma-ray astroparticle physics is a relatively new science; it has as a coun- 
terpart optical astrophysics, one of the oldest sciences. Many of the objects we 
observe in the gamma sky, sensitive to the phenomena of high-energy physics, 
have an optical counterpart or clear relations to optical objects. Finding what 
is a signature of a new phenomenon requires the ability to classify observa- 
tions, and the ability to recognize what is not new. 

Astrophysical databases contain large amounts of data; one example is given 
by the growing number of experiments studying Gamma Ray Bursts (GRBs). 
Data sets can be found in several archives (see e.g. Ref. [11]).

Large datasets are available from systematic sky surveys. The size of such 
databases is now of the order of 10^^ bytes, but in the near future it will 
grow by three orders of magnitude thanks to the technological development 
of telescopes and detectors. Surveys are done on a wide energy range (from 
10~^ to 10^^ eV), and they are heterogeneous (mission-oriented, platform and 
instrument dependent). The attributes registered are variable (polarization 
etc.); numerical simulations have to be matched to real data. 

Such complexity poses nontrivial data management issues (see [12]); moreover,
we need uniform interfaces to access complex data. A few projects have started
in recent years with the simple purpose of making the data readable.

3 The project at the University of Udine

At the University of Udine we are developing a project involving data orga- 
nization, data mining and analysis tools for the analysis of gamma sources 
(Gamma Ray Bursts in particular: most of the EGRET sources were uniden- 
tified). 

The sources detected by GLAST [14] and MAGIC [15] will be compared with
existing databases to detect what is new. What is new can then be classified
with an unsupervised classifier.
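
The comparison step can be illustrated with a minimal sketch, which is not the
project's actual implementation. It assumes that each source is reduced to a
sky position (right ascension and declination in degrees) and that a detection
counts as known when a catalog entry lies within a chosen angular radius. The
function names, the dictionary layout, the coordinates, and the 0.5 degree
radius are all hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small separations.
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def split_known_and_new(detections, catalog, radius_deg):
    """Split detections into (known, new).

    known: list of (detection, nearest catalog entry) pairs within radius_deg.
    new:   list of detections with no catalog entry within radius_deg.
    """
    known, new = [], []
    for det in detections:
        nearest, nearest_sep = None, float("inf")
        for src in catalog:
            sep = angular_separation_deg(det["ra"], det["dec"],
                                         src["ra"], src["dec"])
            if sep < nearest_sep:
                nearest, nearest_sep = src, sep
        if nearest is not None and nearest_sep <= radius_deg:
            known.append((det, nearest))
        else:
            new.append(det)
    return known, new


# Made-up positions, for illustration only (not real catalog entries).
catalog = [
    {"name": "cat-A", "ra": 10.0, "dec": 20.0},
    {"name": "cat-B", "ra": 200.0, "dec": -30.0},
]
detections = [
    {"name": "det-1", "ra": 10.1, "dec": 20.05},  # close to cat-A
    {"name": "det-2", "ra": 120.0, "dec": 5.0},   # far from every entry
]
known, new = split_known_and_new(detections, catalog, radius_deg=0.5)
```

Only the detections left in `new` would be passed on to the unsupervised
classifier. A realistic matching radius would depend on the angular resolution
of each instrument, which this sketch does not model.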

Another important analysis tool is a powerful visualization package: the idea
is to visually present many variables together, offering a degree of control
over a number of different visual properties. To display high-dimensional data
sets, visual properties such as color and size can be added to position. When
combining these properties in a single view makes the display hard to read,
multiple linked views can be used instead.



3.1 Classification of GRBs 

The kernel of the analysis is the strategy for the classification. With the grow- 
ing number of experiments dedicated to GRBs [16] it is essential to optimize 
the techniques for the complex task of classification. Artificial Intelligence- 
(AI-) based pattern recognition algorithms are one possible candidate: auto- 
mated linear classification of vector data into a given number (or an arbitrary 
number) of classes is a well-established technique in the field of machine
learning. Several varieties of AI-based classifiers exist [10].

Clustering is the unsupervised classification of patterns [6] (observations, data 
items or feature vectors) into groups called clusters. Clustering is useful in 
several exploratory pattern analysis, grouping, decision making and machine 
learning situations including data mining, document retrieval, image segmen- 
tation and pattern classification. 

Self-Organising Neural Networks [4,5,7,8,9] are often used to cluster input
data. Similar patterns are grouped by the network and are represented by a
single unit. This grouping is done automatically on the basis of data
correlations. Well-known examples of Self-Organising Artificial Neural Networks
(ANN) used for clustering include Kohonen's self-organising maps, the
Self-Organising Tree Algorithm (SOTA), and Growing Cell Structures (GCS).

In our prototype, Self-Organizing Maps (SOM) were used. 
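
To make the idea concrete, the following is a minimal sketch of a
Self-Organizing Map in plain Python; it is not the prototype's code. The grid
size, the linear decay of learning rate and neighbourhood width, the function
names, and the two-feature synthetic data are all assumptions chosen for
illustration; real inputs would be feature vectors extracted from GRB data.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinates of the unit whose weights are closest to x."""
    return min(weights,
               key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every unit from a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays
            sigma = max(sigma0 * (1.0 - frac), 0.01)  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                # Squared distance on the grid, not in feature space.
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull the unit towards the input, scaled by the neighbourhood.
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
            step += 1
    return weights


# Synthetic two-feature data: two well-separated groups (illustration only).
rng = random.Random(1)
group_a = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(20)]
group_b = [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(20)]
weights = train_som(group_a + group_b, rows=1, cols=2)

# Each input is assigned to the cluster represented by its best-matching unit.
units_a = {best_matching_unit(weights, x) for x in group_a}
units_b = {best_matching_unit(weights, x) for x in group_b}
```

With a 1 x 2 grid the map reduces to two prototype vectors, one per group.
Larger grids keep similar clusters on neighbouring units, which is what makes
the map convenient for visual inspection.
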
4 Research Perspectives

One promising area where the potential of self-organizing networks has not
been fully exploited is data mining and knowledge discovery. Clustering huge
data sets without knowing the number of clusters in advance is something such
a strategy should excel at.

Building hybrid neural networks (combining various self-organizing networks)
can result in efficient clustering.

Visualisation has an important role in cluster analysis: advanced visualisation
techniques [13] such as Galaxies, Correlation Tool, OmniViz Pro, and Hypercube
help in analyzing clusters. Integrating these techniques with neural networks
can provide interesting results.

GRB classification [10] could be a case study to use as a benchmark. Possible
applications could be tested on data sets from the GRB catalogs, for example
using light curves or band-spectral parameters.

Separation of gammas from hadrons is another important and difficult problem
in gamma-ray experiments. The classification problem has been addressed
with supervised neural networks, where the separation is based on the study
of simulated data. It is very likely that severe adjustments have to be made to
the simulation to better reflect the data, after which the network training has
to be redone with the improved simulation. The disadvantages of this approach
are the ambiguity of the output and the need to refine the network constantly
to improve the separation. Applying Self-Organizing Networks would be useful,
as the classification could be automatic and model-independent.

The final research perspective is a library of Science Tools for AstroParticle
Physics. Such a library should include tools for data mining, tools for
optimizing the feature selection (physical characteristics which can be
extracted from different detectors, in particular GLAST, MAGIC, and X-ray
detectors like INTEGRAL, CHANDRA, SWIFT), and a powerful visualization package.